This paper proposes a simple yet effective approach to learning visual features online to improve loop-closure detection and place recognition in bag-of-words frameworks. The approach learns a bag-of-words codeword from a pair of matched features in two consecutive frames. Because the codeword is derived from two temporally adjacent views, it gains invariance to the perspective changes caused by camera motion. The learning algorithm is efficient. The binary descriptor is generated from the mean image patch, and the mask is learned by discriminative projection, which minimizes the intra-class distances between the learned feature and the two original features. A codeword is formed by packaging the learned descriptor and mask, and a masked Hamming distance is defined to measure the distance between two codewords. The geometric properties of the learned codewords are then justified mathematically. In addition, hypothesis constraints are imposed through temporal consistency of matched codewords, which improves precision. The approach, integrated into an incremental bag-of-words system, is validated on multiple benchmark data sets and compared with state-of-the-art methods. Experiments show that it outperforms the state of the art in precision and recall with little loss in runtime.
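The codeword pipeline described above can be illustrated with a minimal Python sketch. It relies on assumptions that the abstract does not specify. First, the binary descriptor uses BRIEF-style pairwise intensity tests. Second, the mask is a simple stand-in for the discriminative projection: it keeps only the bits on which the mean-patch descriptor agrees with both original descriptors. Third, the masked Hamming distance counts differing bits over the positions enabled in both masks. All names (`mean_patch`, `learn_codeword`, `masked_hamming`, `Codeword`) are hypothetical and are not the authors' implementation.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Patch = List[List[float]]
Pair = Tuple[Tuple[int, int], Tuple[int, int]]


def mean_patch(p1: Patch, p2: Patch) -> Patch:
    # Element-wise average of two equally sized image patches.
    return [[(a + b) / 2.0 for a, b in zip(r1, r2)] for r1, r2 in zip(p1, p2)]


def binary_descriptor(patch: Patch, pairs: Sequence[Pair]) -> int:
    # Assumed BRIEF-style test: bit i is 1 if the first pixel of pair i
    # is darker than the second pixel.
    bits = 0
    for i, ((r1, c1), (r2, c2)) in enumerate(pairs):
        if patch[r1][c1] < patch[r2][c2]:
            bits |= 1 << i
    return bits


@dataclass(frozen=True)
class Codeword:
    descriptor: int  # learned binary descriptor, stored as a bitset
    mask: int        # bit i = 1 means bit i takes part in distance computations


def learn_codeword(p1: Patch, p2: Patch, pairs: Sequence[Pair]) -> Codeword:
    # Descriptors of the two matched features and of their mean patch.
    d_a = binary_descriptor(p1, pairs)
    d_b = binary_descriptor(p2, pairs)
    d_mean = binary_descriptor(mean_patch(p1, p2), pairs)
    # Hypothetical mask rule (stand-in for discriminative projection):
    # drop any bit where the learned descriptor disagrees with either
    # original, so the masked intra-class distance to both is zero.
    full = (1 << len(pairs)) - 1
    disagree = (d_mean ^ d_a) | (d_mean ^ d_b)
    return Codeword(d_mean, full & ~disagree)


def masked_hamming(x: Codeword, y: Codeword) -> int:
    # Count differing bits only where both codewords' masks are set
    # (the combination rule is an assumption).
    valid = x.mask & y.mask
    return bin((x.descriptor ^ y.descriptor) & valid).count("1")


if __name__ == "__main__":
    p1 = [[10, 20], [30, 40]]
    p2 = [[12, 18], [50, 5]]
    pairs = [((0, 0), (0, 1)), ((1, 0), (1, 1)), ((0, 0), (1, 1))]
    cw = learn_codeword(p1, p2, pairs)
    print(cw)  # Codeword(descriptor=5, mask=1)
    print(masked_hamming(cw, Codeword(0b100, 0b111)))  # 1
```

In this toy example, only bit 0 survives masking, because the other two intensity tests flip between the two views. This mirrors the intuition that unstable bits should not contribute to codeword distances.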